Zur Theorie künstlicher neuronaler Netze
This thesis contributes to the theory of artificial neural networks in four fields: computer science, with a new learning procedure (stable parameter adaptation); mathematics, with an analysis of the structure of the weight space; statistics, with a new estimator for the quality of networks (clustered bootstrap); and physics, with efficient learning and inference algorithms for decimatable Boltzmann machines.
Mapping networks are defined and their chain rule is derived and cast into several valid algorithmic variants; backpropagation networks are defined, the backpropagation algorithm is presented in as general a form as possible, and it is demonstrated how this framework can also be applied to recurrent networks.
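A minimal sketch of the chain rule behind backpropagation (my own illustration, not the thesis's general formulation): for a tiny two-layer network, the analytic gradients are checked against central finite differences.

```python
import math

# Tiny network y = w2 * tanh(w1 * x) with squared-error loss (y - t)^2.
def forward(w1, w2, x):
    h = math.tanh(w1 * x)
    return h, w2 * h

def grads(w1, w2, x, t):
    """Exact gradients of the loss via the chain rule."""
    h, y = forward(w1, w2, x)
    dy = 2 * (y - t)                  # d(loss)/dy
    dw2 = dy * h                      # through the output weight
    dw1 = dy * w2 * (1 - h * h) * x   # through the hidden tanh unit
    return dw1, dw2

def loss(w1, w2, x, t):
    return (forward(w1, w2, x)[1] - t) ** 2

w1, w2, x, t, eps = 0.3, -0.7, 1.5, 0.2, 1e-6
g1, _ = grads(w1, w2, x, t)
numeric = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
assert abs(g1 - numeric) < 1e-5      # chain rule agrees with finite differences
```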
The limits of gradient descent are shown, and known alternative methods are reviewed critically. Building on this, a class of new, mutually related optimisation algorithms is developed with efficiency and stability in mind; their theoretical power is underpinned by a proof of first-order convergence. Second-order information can be incorporated into the new method. Empirical comparisons corroborate its efficiency. The limits of optimisation methods are discussed.
Learning in neural networks is then treated as a statistical estimation problem. The quality of the estimate can be computed with known statistical methods. It is shown that, owing to shortcomings of neural learning, these quality estimates are either not robust or too imprecise.
The effort to filter out these shortcomings leads to a new theoretical view of the weight space: it must naturally be understood as a manifold. It turns out that computing the canonical metric on the weight space is NP-hard; at the same time, it is shown that an efficient approximation of the metric is possible. This makes it possible to cluster and visualise learning results in weight space. As a further application of this theory, a robust model-selection procedure is presented and demonstrated on an example. Finally, the problem posed in the previous paragraph can also be solved by a new procedure.
The physically motivated Boltzmann machine is presented, and it is argued why inference is NP-hard here. This motivates a restriction to the sufficiently interesting class of decimatable Boltzmann machines. A new decimation rule is introduced, and it is shown that no further ones exist. Decimatable Boltzmann machines are studied with tools of probability theory, and efficient learning algorithms are proposed. The weight-space structure can again be exploited successfully, as an application demonstrates.
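One classical decimation rule for Boltzmann machines with ±1 units (a textbook series-reduction identity, not necessarily the new rule the thesis introduces) sums out a middle unit on a chain, leaving an effective direct coupling. The sketch below checks the identity numerically against brute-force marginalisation.

```python
import math
from itertools import product

def decimate_series(j1, j2):
    """Effective coupling after summing out the middle unit of s1-s2-s3."""
    return math.atanh(math.tanh(j1) * math.tanh(j2))

def pair_marginal(j1, j2):
    """Brute-force marginal p(s1, s3) of the three-unit chain."""
    z = {}
    for s1, s2, s3 in product((-1, 1), repeat=3):
        w = math.exp(j1 * s1 * s2 + j2 * s2 * s3)
        z[(s1, s3)] = z.get((s1, s3), 0.0) + w
    total = sum(z.values())
    return {k: v / total for k, v in z.items()}

def decimated_marginal(j_eff):
    """Marginal of the reduced two-unit machine."""
    z = {(s1, s3): math.exp(j_eff * s1 * s3)
         for s1, s3 in product((-1, 1), repeat=2)}
    total = sum(z.values())
    return {k: v / total for k, v in z.items()}

j1, j2 = 0.8, -0.5
exact = pair_marginal(j1, j2)
reduced = decimated_marginal(decimate_series(j1, j2))
assert all(abs(exact[k] - reduced[k]) < 1e-12 for k in exact)
```

The identity `tanh(j_eff) = tanh(j1)·tanh(j2)` is what makes such chains reducible in closed form, and it is this kind of closure under reduction that defines the decimatable class.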
Info Navigator: A visualization tool for document searching and browsing
In this paper we investigate the retrieval performance of monophonic and polyphonic queries made on a polyphonic music database. We extend the n-gram approach for full-music indexing of monophonic music data to polyphonic music using both rhythm and pitch information. We define an experimental framework for a comparative and fault-tolerance study of various n-gramming strategies and encoding levels. For monophonic queries, we focus in particular on query-by-humming systems, and for polyphonic queries on query-by-example. Error models addressed in several studies are surveyed for the fault-tolerance study. Our experiments show that different n-gramming strategies and encoding precisions differ widely in their effectiveness. We present the results of our study on a collection of 6366 polyphonic MIDI-encoded music pieces.
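To illustrate the n-gram indexing idea (a sketch under my own simplified encoding, not the paper's exact n-gramming scheme), a monophonic pitch sequence can be turned into interval n-grams, which makes the index invariant to transposition, so a hummed query in a different key can still match:

```python
def interval_ngrams(pitches, n=3):
    """Slide a window of n pitch intervals over a monophonic note sequence."""
    # Intervals (pitch differences) rather than absolute pitches give
    # transposition invariance.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

# MIDI pitches for C4 D4 E4 D4 C4
print(interval_ngrams([60, 62, 64, 62, 60], n=2))
# → [(2, 2), (2, -2), (-2, -2)]
```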
Multimedia resource discovery
This chapter examines the challenges and opportunities of Multimedia Information Retrieval and corresponding search engine applications. Computer technology has changed our access to information tremendously: we used to search for authors or titles (which we had to know) in library cards in order to locate relevant books; now we can issue keyword searches within the full text of whole book repositories in order to identify authors, titles and locations of relevant books. What about the corresponding challenge of finding multimedia by fragments, examples and excerpts? Rather than asking for a music piece by artist and title, can we hum its tune to find it? Can doctors submit scans of a patient to identify medically similar images of diagnosed cases in a database? Can your mobile phone take a picture of a statue and, by sending this picture to a service, tell you about its artist and significance?
In an attempt to answer some of these questions we get to know basic concepts of multimedia resource discovery technologies for a number of different query and document types: piggy-back text search, i.e., reducing the multimedia to pseudo text documents; automated annotation of visual components; content-based retrieval where the query is an image; and fingerprinting to match near duplicates.
Some of the research challenges are given by the semantic gap between the simple pixel properties computers can readily index and high-level human concepts; related to this is an inherent technological limitation of automated annotation of images from pixels alone. Other challenges are given by polysemy, i.e., the many meanings and interpretations that are inherent in visual material and the corresponding wide range of a user’s information need.
This chapter demonstrates how these challenges can be tackled by automated processing and machine learning and by utilising the skills of the user, for example through browsing or through a process called relevance feedback, thus putting the user at centre stage. The latter is made easier by "added value" technologies, exemplified here by summaries of complex multimedia objects such as TV news, information visualisation techniques for document clusters, visual search by example, and methods to create browsable structures within the collection.
Adverse Drug Reaction Classification With Deep Neural Networks
We study the problem of detecting sentences describing adverse drug reactions (ADRs) and frame the problem as binary classification. We investigate different neural network (NN) architectures for ADR classification. In particular, we propose two new neural network models: the Convolutional Recurrent Neural Network (CRNN), formed by concatenating convolutional neural networks with recurrent neural networks, and the Convolutional Neural Network with Attention (CNNA), formed by adding attention weights to convolutional neural networks. We evaluate various NN architectures on a Twitter dataset containing informal language and an Adverse Drug Effects (ADE) dataset constructed by sampling from MEDLINE case reports. Experimental results show that all the NN architectures considerably outperform traditional maximum entropy classifiers trained on n-grams with different weighting strategies on both datasets. On the Twitter dataset, all the NN architectures perform similarly, but on the ADE dataset the plain CNN performs better than the more complex CNN variants. Nevertheless, CNNA allows the visualisation of the attention weights of words when making classification decisions and hence is more appropriate for the extraction of word subsequences describing ADRs.
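The attention mechanism that makes CNNA's decisions inspectable can be sketched as a weighted pooling over per-position convolution features. This is a minimal NumPy illustration with assumed shapes and an assumed trainable context vector `u`, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pool(features, u):
    """Pool per-position features with attention weights.

    features: (positions, dim) activations after convolution over words
    u:        (dim,) attention context vector (a trainable parameter)
    Returns the pooled vector and the per-position weights; the weights
    show which word positions contributed most to the decision.
    """
    scores = features @ u          # one relevance score per position
    weights = softmax(scores)      # normalise to a distribution
    pooled = weights @ features    # weighted sum replaces max-pooling
    return pooled, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(7, 4))    # 7 word positions, 4 filters
pooled, w = attentive_pool(feats, rng.normal(size=4))
assert abs(w.sum() - 1.0) < 1e-9 and pooled.shape == (4,)
```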
How reliable are annotations via crowdsourcing? A study about inter-annotator agreement for multi-label image annotation
The creation of gold-standard datasets is a costly business. Optimally, more than one judgment per document is obtained to ensure high-quality annotations. In this context, we explore how much annotations from experts differ from each other, how different sets of annotations influence the ranking of systems, and whether these annotations can be obtained with a crowdsourcing approach. This study is applied to annotations of images with multiple concepts. A subset of the images employed in the latest ImageCLEF Photo Annotation competition was manually annotated by expert annotators and by non-experts on Mechanical Turk. The inter-annotator agreement is computed at an image-based and a concept-based level using majority vote, accuracy and kappa statistics. Further, the Kendall τ and Kolmogorov-Smirnov correlation tests are used to compare the ranking of systems with respect to different ground truths and different evaluation measures in a benchmark scenario. Results show that while the agreement between experts and non-experts varies depending on the measure used, its influence on the ranked lists of the systems is rather small. To sum up, the majority vote, applied to generate one annotation set out of several opinions, is able to filter noisy judgments of non-experts to some extent. The resulting annotation set is of comparable quality to the annotations of experts.
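The two core measurements here, majority voting over crowd judgments and chance-corrected agreement, can be sketched in a few lines (my own illustration with invented toy labels; the tie-breaking rule is an assumption, not the paper's):

```python
from collections import Counter

def majority_vote(labels):
    """Most frequent label among several annotators (ties -> smallest label)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(l for l, c in counts.items() if c == top)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotation sets on one concept."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                 # observed
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] / n * cb[l] / n for l in set(a) | set(b))   # by chance
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

expert  = [1, 0, 1, 1, 0, 1]                       # one binary concept, 6 images
workers = [[1, 0, 1, 0, 0, 1],                     # three noisy crowd workers
           [1, 1, 1, 1, 0, 1],
           [1, 0, 0, 1, 0, 1]]
crowd = [majority_vote(col) for col in zip(*workers)]
print(crowd, cohens_kappa(expert, crowd))
# → [1, 0, 1, 1, 0, 1] 1.0   (the vote filtered out each worker's noise)
```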
High-dimensional visual vocabularies for image retrieval
In this paper we formulate image retrieval by text query as a vector space classification problem. This is achieved by creating a high-dimensional visual vocabulary that represents the image documents in great detail. We show how the representation of these image documents enables the application of well-known text retrieval techniques such as Rocchio tf-idf and naïve Bayes to the semantic image retrieval problem. We tested these methods on a Corel images subset and achieve state-of-the-art retrieval performance using the proposed methods.
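Once images are represented as bags of visual words, text classifiers apply directly. A self-contained sketch (my own simplification; the toy "visual words" and class labels are invented for illustration) of tf-idf weighting with Rocchio nearest-centroid classification:

```python
import math
from collections import Counter, defaultdict

def tfidf(docs):
    """docs: list of visual-word lists -> list of sparse tf-idf vectors."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [{w: (c / len(d)) * math.log(n / df[w])
             for w, c in Counter(d).items()} for d in docs]

def rocchio_centroids(vecs, labels):
    """Mean tf-idf vector per class: the Rocchio prototype."""
    sums, counts = defaultdict(lambda: defaultdict(float)), Counter(labels)
    for v, y in zip(vecs, labels):
        for w, x in v.items():
            sums[y][w] += x
    return {y: {w: x / counts[y] for w, x in s.items()} for y, s in sums.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(v, centroids):
    """Assign the class whose centroid is nearest in cosine similarity."""
    return max(centroids, key=lambda y: cosine(v, centroids[y]))

docs = [['sky', 'sea'], ['sky', 'sun'], ['cat', 'fur'], ['cat', 'paw']]
labels = ['beach', 'beach', 'animal', 'animal']
vecs = tfidf(docs)
cents = rocchio_centroids(vecs, labels)
assert classify(vecs[0], cents) == 'beach'
assert classify(vecs[2], cents) == 'animal'
```

With a real visual vocabulary the vectors are far higher-dimensional, but the sparse-dictionary representation above scales in the same way text retrieval does.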
Conservation of effort in feature selection for image annotation
This paper describes an evaluation of a number of subsets of features for the purpose of image annotation using a non-parametric density estimation algorithm. By applying some general recommendations from the literature and through evaluating a range of low-level visual feature configurations and subsets, we achieve an improvement in performance, measured by mean average precision, from 0.2861 to 0.3800. We demonstrate the significant impact that the choice of visual or low-level features can have on an automatic image annotation system. There is often a large set of possible features that may be used, and a correspondingly large number of variables that can be configured or tuned for each feature, in addition to other options for the annotation approach. Judicious and effective selection of features for image annotation is required to achieve the best performance with the least user design effort. We discuss the performance of the chosen feature subsets in comparison with previous results and propose some general recommendations observed from the work so far.
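Mean average precision, the measure behind the reported 0.2861 to 0.3800 improvement, is easy to state precisely. A minimal sketch (my own, with invented toy rankings) for scoring feature configurations on the same set of queries:

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision at each relevant hit."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank      # precision at this recall point
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy comparison of one feature configuration over two queries.
runs = [(['a', 'x', 'b'], {'a', 'b'}),   # AP = (1/1 + 2/3) / 2 = 5/6
        (['y', 'a'], {'a'})]             # AP = 1/2
assert abs(mean_average_precision(runs) - 2 / 3) < 1e-9
```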
Method for merging subtitles
A method of electronically processing a broadcast subtitle signal so as to merge lines of subtitles and correct possible transmission errors allows subtitles to be rendered and/or stored in a form suitable for further text processing and, in particular, keyword searches. Applications of the method, or of a computer program or system implementing the method, include the creation, maintenance, indexing or searching of a multimedia library or database, in particular when the library comprises broadcast television programmes and the corresponding subtitles.
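A toy sketch of the merging idea (my own illustration, not the patented method, and with a naive punctuation heuristic as the merge criterion): accumulate broadcast subtitle fragments until sentence-final punctuation, producing text that ordinary keyword search can index.

```python
def merge_subtitle_lines(lines):
    """Join consecutive subtitle fragments into sentences.

    Broadcast subtitles are split to fit the screen; re-joining them at
    sentence-final punctuation yields searchable running text.
    """
    sentences, buffer = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        buffer.append(text)
        if text.endswith(('.', '!', '?')):
            sentences.append(' '.join(buffer))
            buffer = []
    if buffer:                        # trailing fragment without punctuation
        sentences.append(' '.join(buffer))
    return sentences

print(merge_subtitle_lines(['Good evening and', 'welcome to the news.', 'Tonight:']))
# → ['Good evening and welcome to the news.', 'Tonight:']
```

A production system would additionally need the error correction the abstract mentions (e.g. repairing characters corrupted in transmission), which this sketch omits.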
Mining multimedia salient concepts for incremental information extraction
We propose a novel algorithm for extracting information by mining the feature-space clusters and then assigning salient concepts to them. Bayesian techniques for extracting concepts from multimedia usually suffer either from a lack of data or from concepts too complex to be represented by a single statistical model. An incremental information extraction approach, working at different levels of abstraction, would be able to handle concepts of varying complexities. We present the results of our research on the initial part of an incremental approach: the extraction of the most salient concepts from multimedia information.